Effect of thesaurus size on schema matching quality

Authors

  • Thabit Sabbah
  • Ali Selamat
  • Mahmood Ashraf
  • Tutut Herawan
Abstract

Thesauri are used in many Information Retrieval (IR) applications, such as data integration, data warehousing, semantic query processing, and schema matching. Schema matching, or mapping, is one of the most important basic steps in data integration: it is the process of identifying the semantic correspondences or equivalences between two or more schemas. Because many thesauri exist for the same knowledge domain, it is hard to predict how the quality of schema matching results changes when different thesauri are used in a specific knowledge field. In this research, we studied the effect of thesaurus size on schema matching quality by conducting many experiments using different thesauri. In addition, we propose a new method for calculating the similarity between vectors extracted from a thesaurus database. The method is based on the ratio of individual shared elements to the elements in the compound set of the vectors. Moreover, we explain in detail the efficient algorithm used to search the thesaurus database. After describing the experiments, we present results that show an enhancement in the average similarity. The completeness, effectiveness, and their harmonic mean were calculated to quantify the quality of matching. Experiments on two different thesauri show positive results, with an average Precision of 35% and a lower average Recall. The effect of thesaurus size on the quality of matching was statistically insignificant; however, other factors affecting the output, and the exact magnitude of the change, remain the focus of our future study. © 2014 Elsevier B.V. All rights reserved.
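
The similarity ratio and the quality measures mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the "compound set" is read as the set union of the two vectors, "shared elements" as their intersection, and "effectiveness" and "completeness" as Precision and Recall, with the F-measure as their harmonic mean. The function names `shared_ratio_similarity` and `matching_quality` and the sample data are hypothetical, and this is not the authors' implementation.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def shared_ratio_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """Ratio of shared elements to elements in the combined set.

    Assumption: the "compound set" is the union of both vectors,
    which makes this ratio the same as the Jaccard index.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty vectors: treat as no similarity
    return len(set_a & set_b) / len(union)


def matching_quality(found: Set[Pair], expected: Set[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, f_measure) for a set of proposed matches.

    Assumption: `expected` is a hand-made reference mapping.
    """
    correct = len(found & expected)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(expected) if expected else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical term vectors, as if retrieved from a thesaurus for two schema labels.
terms_a = ["car", "auto", "automobile", "vehicle"]
terms_b = ["auto", "vehicle", "truck"]
similarity = shared_ratio_similarity(terms_a, terms_b)  # 2 shared / 5 combined

# Hypothetical matcher output compared against a reference mapping.
found = {("name", "fullname"), ("addr", "address"), ("id", "phone")}
expected = {("name", "fullname"), ("addr", "address"),
            ("zip", "postcode"), ("tel", "phone")}
precision, recall, f_measure = matching_quality(found, expected)
```

Under this reading, a larger thesaurus can change both the numerator and the denominator of the ratio, since it may return more shared terms as well as more unshared ones for the same pair of labels.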


Similar resources

Automatic generation of probabilistic relationships for improving schema matching

Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and in structure. Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names), it is possible to discover lexical relationships among the elements of different schemata. In this work, we propose an automatic method aimed at discovering p...


An Improved Semantic Schema Matching Approach

Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Processing (OLAP), data mining, the semantic web [2], and schema integration. The task is to find the semantic correspondences between elements of two schemas. Recently, schema matching has attracted considerable interest in both research and practice. In this paper, we present a new impr...


Generic Schema Matching with Cupid

ACM Trans. Database Syst. 30(2), 2005, 624-660. Jayant Madhavan, Philip A. Bernstein, Erhard Rahm: Generic Schema Matching with Cupid. VLDB'01 ...


Schema label normalization for improving schema matching

Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and in structure. Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical r...


Schema Normalization for Improving Schema Matching

Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and in structure). Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) ...



Journal title:
  • Knowl.-Based Syst.

Volume 71

Publication date: 2014